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SYMMETRIC ORTHOGONAL TENSOR DECOMPOSITION IS 

TRIVIAL* 

TAMARA G. KOLDAt 


Abstract. We consider the problem of decomposing a real-valued symmetric tensor as the 
sum of outer products of real-valued, pairwise orthogonal vectors. Such decompositions do not 
generally exist, but we show that some symmetric tensor decomposition problems can be converted 
to orthogonal problems following the whitening procedure proposed by Anandkumar et al. (2012). 
If an orthogonal decomposition of an m -way n-dimensional symmetric tensor exists, we propose a 
novel method to compute it that reduces to an n x n symmetric matrix eigenproblem. We provide 
numerical results demonstrating the effectiveness of the method. 


1. Introduction. Let A be an ra-way n-dimensional real-valued symmetric ten¬ 
sor. Let A represent an ra-way, n-dimension symmetric tensor. Given a real-valued 
vector x of length n, we let x m denote the m-way, n-dimensional symmetric outer 
product tensor such that ^ ^ • • • Xi m . Comon et al. [2] showed there 

exists a decomposition of the form 


A = (1.1) 

fc=1 

where A = [Ai ■ ■ ■ A P ] T £ R p and X = [xi • • ■ x p ] £ R" xp ; see Figure 1.1. Without loss 
of generality, we assume each x^. has unit norm, i.e., ||xfc|| 2 = 1. The least value p such 
that (1.1) holds is called the symmetric tensor rank. Finding real-valued symmetric 
tensor decompositions has been the topic of several recent papers, e.g., [4]. 
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Fig. 1.1. Symmetric tensor factorization for m = 3. 

If we can discover an X with orthogonal columns, i.e., X T X = I p , then we say that 
A has an orthogonal symmetric tensor decomposition. Generally, orthogonal decom¬ 
positions do not exist; Robeva [7] classifies the tensors that have such decompositions. 
Nevertheless, we consider the problem of how to compute orthogonal symmetric ten¬ 
sor decompositions, as has been recently considered by Anandkumar et al. [1]. They 
show that the pairs (Afc,x*,) are Z-eigenpairs of A for k = 1 and propose 

solving the problem via an iterative power method. We show that this problem can 
instead be solved via a symmetric matrix eigenproblem on an n x n matrix, including 
determining the rank. This is can be interpreted as a special case of the simultaneous 
matrix diagonalization approach proposed by De Lathauwer [3]. 
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Although orthogonal symmetric tensor decompositions do not generally exist, 
Anandkumar et al. [1] showed that certain symmetric tensor decompositions problems 
can be transformed via whitening to orthogonal problems. We generalize their results 
to show that such a transformation is possible whenever X has full column rank 
and there exists a linear combination of two-dimensional slices of A that is positive 
definite. The second condition is always satisfied, for example, if A is positive definite. 

2 . Background. A tensor is a multidimensional array. The number of ways or 
modes is called the order of a tensor. For example, a matrix is a tensor of order two. 
Tensors of order three or greater are called higher-order tensors. 

We use the notation I„ to denote the n x n identity matrix. 

We use the term generic to mean with probability one. For instance, it is well 
known that a random n x n matrix generically has rank n. 

2.1. Symmetry. A tensor is symmetric if its entries do not change under per¬ 
mutation of the indices. Formally, we let 7r(m) denote the set of permutations of 
length m. We say a real-valued w-way n-dimensional tensor A is symmetric [2] if 

= a h-i m for all *i,. ..,i m S {1,. ..,n} and p e 7r(m). 

2 . 2 . Tensor-vector Products. The tensor-vector product Ax ( ' m ~ 1 ' > produces 
a vector in M" such that 

n 

(Ax™ -1 )^ = ^2 a ii-i m Xia ■ ■ ■ X im for i\ £ { 1, . . . , 71 } . 

The tensor-vector product Ax m produces a scalar such that 

n 

Ax m = X T (Ax™- 1 ) = ^2 a h...im X il ' ' ' X im ■ 

A symmetric tensor A is positive definite if Ax m > 0 for all x/0. 

2 . 3 . Tensor-matrix Products. Let V be an p x n matrix. Then the tensor- 
matrix product A(V,..., V) indicates multiplication of the tensor A in each mode 
by the matrix V. The result is a symmetric tensor that is the same order as A but 
now p-dimensional such that 

n 

(A(V,..., V))^ = ^2 a h-i m v iijx for 

2 . 4 . Tensor Rank. Recall that the rank of a tensor A, denoted rank(A), is 
the smallest number of rank-one tensors that sums to the original tensor [5]. The 
symmetric tensor rank, denoted symrank(A), is the smallest number of symmetric 
rank-one tensors that sums to the original tensor [2]. In the case of the tensor A in 
(1.1) with X having orthogonal columns, it is easy to show that the symmetric tensor 
rank of A is equal to the tensor rank of A which is equal to the matrix rank of X 
which is equal to p , i.e., 

symrank(A) = rank(A) = rank(X) = p, 


via unfolding arguments. 
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3. Orthogonal Symmetric Decomposition. Given A, our goal is to find A 
and X that satisfies (1.1), under the assumption that X is known to have orthogonal 
columns. 

We define B to be an arbitrary linear combination of slices of A: 

p 

B= ]T J S i3 -i m A(:,: ) i 3 ,... J i r „)=^ < 7 fc x fc xJ = XSX T (3.1) 

. ,im k—1 

where the ^-values define the linear combination and 

crk = A fc E Pi 3 ... im Xi 3 k ■ ■ • Xi m k for k = l,...,p. 

13 

Observe that ((Jfc,Xk) is an eigenpair of B. 

3.1. Generic case. If X is a generic matrix (with orthogonal columns) and the 
/3-values are arbitrary, then A& ^ 0 generically implies Ok 0 for k = 1,... ,p. It 
follows that the tensor rank of A is generically equal to the matrix rank of B, i.e., 

rank(A) = rank(B). 

Additionally, the a -values are generically distinct, so the eigenvectors of B are 
unambiguous and equal to the vectors in the decomposition of A. Therefore, we can 
compute X from B via an eigenvalue decomposition and recover the A-values via 

Afc = Ax“ for k=l,...,p. (3.2) 

3.2. Non-generic Case. In the generic case, the matrix rank of B is equal to 
the symmetric tensor rank of A and the nonzero eigenvalues of B are distinct. In 
order to guard against the non-generic case, randomly project the tensor as follows. 
Let V £ R nx " be a random orthonormal matrix. Then compute, 

A = A(V, ..., V) = E A fc (Vx fc ) m = E X ^k- 

k =1 fc=1 

Apply the procedure outlined above to obtain the decomposition of A in terms of A 
and X. Then the decomposition of A is given by 

A = A and X = V T X. 

3.3. Algorithm. The algorithm for computing the orthogonal symmetric de¬ 
composition of A using optional randomization is given in Algorithm 1. 

4. Whitening. Although most tensors do not have symmetric decompositions, 
Anandkumar et al. [1] show how whitening may be used in a special case of nonorthog- 
onal symmetric tensor decomposition. We generalize their result. 

Given A, our goal is to find A and X that satisfies (1.1), under the assumption 
that X is known to have full column rank but may not have orthogonal columns. 
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Algorithm 1 Orthogonal Symmetric Decomposition 

Input: Let A be a symmetric m- way, n-dimensional real-valued tensor that is known 
to have an orthogonal symmetric tensor decomposition. 

1: if apply randomization then 

2 : V 7— random n x n orthonormal matrix 

3: else 

4: V «- I„ 

5: end if 

6 : A <- A(V, • • •, V) 

7: 0 4— arbitrary (m — 2)-way real-valued tensor of dimension n 
8 : B i ■> *3> • ■ • > im) 

9: { (7k, Xfc }k=i t— eigenpairs of B with cr*, ^ 0 
10: for k = 1,... ,p do 
11: X k < V ' X;, 

12: Afc -f- Ax.™ 

13: end for 


4.1. Transformation to orthogonal problem. Let C be an arbitrary linear 
combination of slices: 


^ ^ ^ ^ 3 ) • • • , i m ), ( 4 - 1 ) 

*3,■■■dm 

where the 7 -values define the linear combination. If C is positive semi-definite (p.s.d.) 
and has the same rank as A, then we can apply whitening as follows. Let 

udu t = c 

be the “skinny” eigendecomposition of C where U is an orthogonal matrix of size 
n x p and D is a diagonal matrix of size p x p. Define the whitening matrix as 

W = D~ 1 / 2 U t so that WCW T = (WUD 1/2 )(WUD 1/2 ) T = l p . 

We use W to whiten the tensor as 

p p 

A = A( W,.. ., W) = Xl = Wxk = E 

fe=l fe=l 

Now, X = WX € R :pxp is a matrix with orthogonal columns. Moreover, the size of 
the problem is reduced because A is an m-way 71 -dimensional tensor. 

We compute the orthogonal tensor decomposition of A via the procedure outlined 
above to get A and X. The final decomposition is given by 

A = A and X = W t X. 

Here = UD 1 / 2 represents the psuedoinverse of W. 

4.2. Failure of the Method. If the algorithm cannot find a p.s.d. matrix C, 
then the algorithm fails. Additionally, if rank(C) < p , then the algorithm has a soft 
failure. 
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Algorithm 2 Whitening for Orthogonal Symmetric Decomposition 
Input: Let A be an m-way, n-dimensional real-valued tensor. 

1: if apply random orthogonal matrix then 
2: V t— random n x n orthonormal matrix 

3: else 

4: V^I n 

5: end if 

6: A <- -A(V, . . . , V) 

7: repeat 

8 : 7 <— arbitrary (to — 2 )-way real-valued tensor of dimension n 

9- c 4— ■) * *3) • ■ ■ J *m) 

10: until C is p.s.d. or exit with failure 
11 : UDU t •<— “skinny” eigendecomposition of C 
12: W <- D^ 1/2 U T 
13: A <- A(W, . . . , W) 

14: /3 <— arbitrary (to — 2)-way real-valued tensor of same dimension as A 
15: B t ■! * 3 ) • • ■ > *m) 

16: { < 7 fc, Xfe }£_. •<— eigenpairs of B with cr fc ^ 0 
17: for fc=l,...,pdo 
18: X fe <- Y T UD 1/2 X fc 

19: A fc t- ilx“ 

20: end for 


4.3. Algorithm. The algorithm for computing the symmetric decomposition of 
A using whitening is given in Algorithm 2 . 

5. Numerical Results. We consider the results of applying the algorithms to 
numerical examples. All experiments are done in MATLAB, Version R2014b. Nu¬ 
merically, we say 

• an eigenvalue a of B is nonzero if \a\ > 10 —10 , 

• C is p.s.d. if its smallest eigenvalue satisfies d > —10 —10 , and 

• the skinny decomposition of C uses only eigenvalues such that d > 10 -10 . 
We choose both fj and 7 values from C/[0,1], i.e., uniform random on the interval 
[0,1] and then normalize so the values sum to one. Random orthogonal matrices are 
generated via the MATLAB code RANDORTHMAT by Olef Shilon. 

We generate artificial data as follows. For a given A* £ and X* £ R nxp , the 
noise-free data tensor is given by 

A * = £ A fe(xfc) m - (5-1) 

fc=1 

The data tensor A may also be contaminated by noise as controlled by the parameter 
77 > 0 , i.e., 

A = A* + 77 ^|jl JSf where ~ Af( 0,1). (5.2) 

Here N is a noise tensor such that each element is drawn from a normal distribution, 
i.e., ~ Af( 0,1). The parameters m, n, p control the size of the problem. 

In our randomized experiments, we consider three sizes: 
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• m = 3, n = 4,p = 2; 

• to = 4, n = 25, p = 3; and 

• to = 6, tt = 6,p = 4. 

For each size, we also consider two noise levels: 77 € {0,0.01}, i.e., no noise and a 
small amount of noise. 

The output of each run is a rank p, a weight vector A, and a matrix X. The 
relative error measures the proportion of the observed data that is explained by the 
model, i.e., 


relative error 


k=1 


M 


In the case of no noise, the ideal relative error is zero; otherwise, we hope for something 
near the noise level, i.e., 77 . 

To compare the recovered solution A and X with the true solution A* and X*, we 
compute the solution score as follows. Without loss of generality, we assume both X 
and X* have normalized columns. (If ||Xfc ||2 7 ^ 1, then we rescale A& = A k \/||xfc|| and 
Xfc = Xfc/||xfc||.) There is a permutation ambiguity, but we permute the computed 
solution so as to maximize the following score: 


solution score 


1 

P 


E 


k =1 


l Afc A fcl ^ | X T X * 
max{|Afc|, |A£|}/ fe 


A solution score of 1 indicates a perfect match. If X has more columns than X*, we 
choose the p columns that maximize the score. 

5.1. Orthogonal Example Showing Impact of /3-values. We discuss why 
we recommend a linear combination of slices instead of a single slice. Consider the 
following example. Let to = 3, n = 3, and p = 3. Further, supposed X* = I„ and A* 
is an arbitrary vector. Assume no noise so that A = A*. Each slide of the tensor A 
is a rank-1 matrix. For instance, 


A(:,:,l) 


Ai 0 0 

0 0 0 
0 0 0 


If we only select the first slice, which corresponds to j3 = [10 0], will not yield a 
matrix B that has rank equal to p. But, using a random linear combination remedies 
this problem. Alternatively, using the randomization to make the problem generic 
will also correct the problem. 

5.2. Random Orthogonal Examples. In this case, we generate tensors such 
that X* is a random orthogonal matrix and A* is the all ones vector. (Note that 
a matrix that had repeated eigenvalues would not have a unique factorization, but 
tensors with repeated eigenvalues are not impacted in the same way.) We apply 
Algorithm 1 with no randomization (V = I n ). The results are shown in Table 5.1. 
We generated 100 random instances for each size, and then we ran the code 10 timers 
per instance (each run uses a different random choice for (3). In the noise-free case 
(77 = 0 ), the method works perfectly: the rank is perfectly predicted and the exact 
solution is found. In the noisy case (77 = 0.1), the method is less reliable. The 
rank is never predicted correctly; instead, the B matrix is nearly always full rank. 
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Nevertheless, the relative error is usually less than IO77 and the solution score is nearly 
always > 0.99. So, we can presumably threshold the small A-values in the noisy cases 
to recover a good solution. 


m 

Size 

n 

P 

Rank 

= P 

No noise 77 = 0 
Rel. Error < le-10 

Soln. 

Score >0.99 

3 

4 

2 

1000 

100 

2.000 

1000 100 

0.0000 

1000 

100 

1.0000 

4 

25 

3 

1000 

100 

3.000 

1000 100 

0.0000 

1000 

100 

1.0000 

6 

6 

4 

1000 

100 

4.000 

1000 100 

0.0000 

1000 

100 

1.0000 


Size 





Noise 77 

= 0.01 




m 

n 

P 

Rank 

= P 

Rel. Error <0.1 

Soln. Score > 0.99 

3 

4 

2 

0 

0 

4.000 

921 100 

0.0428 

925 

100 

0.9903 

4 

25 

3 

0 

0 

24.999 

818 100 

0.0716 

864 

100 

0.9856 

6 

6 

4 

0 

0 

6.000 

393 92 

0.1916 

498 

99 

0.9530 


Table 5.1 


Random orthogonal examples. We create 100 random instances and run the method 10 times 
per instance for a total of 1000 runs. For each metric, we report the total number of runs where the 
metric meets the desired criteria, the number of instances where at least one run meets the desired 
criteria, and the mean value of the metric. 


5.3. Non-orthogonal Example. In Example 5.5(i) of [6], Nie considers an 
method for determining the rank of a tensor. The example tensor is of order m = 4 
and defined by 


'676' 

196 


0 

3/VU 


0.00 

0.80' 

and X* = 

1/V26 

2/y/U 


0.20 

0.53 


_-5/V26 

-1//M 


—0.98 

-0.27 


The matrix X* is not orthogonal but is full column rank. We apply algorithm Algo¬ 
rithm 2 one hundred times. We do not apply randomization (i.e., V = I„). For every 
run, the predicted rank is 2 and the solution score is 1 (perfect match). The average 
number of attempts (i.e., choosing a random set of 7-values) to find a p.s.d. C is 1.26, 
and the maximum is 4. 

5.4. Random Non-orthogonal Examples. In this case, we generate tensors 
such that X* comes from a matrix with entries drawn from the standard norm dis¬ 
tribution whose columns are normalized (i.e., ||||2 = 1 for k = 1,... ,p) and A* is 
the all ones vector. We apply Algorithm 2 with no randomization (V = I n ). The 
method fails is it cannot find a p.s.d. C after 100 attempts. The results are shown in 
Table 5.2. 

In the case of no noise (77 = 0), the method is surprising effective. Every problem is 
solved exactly for the even-order tensors (m = 4 and m = 6), which are constructed so 
that they are positive definite, guaranteeing that a p.s.d. C exists. For the odd-ordered 
tensor (m = 3), the method is able to find a p.s.d. C for 77 out of 100 instances. When 
it successfully finds the transformation, the problem is solved exactly. 

For the noisy case (77 = 0.01), the impact is dramatic. In the smallest example 
(771 = 3,77 = 4 ,p = 2), only 32 instances can find a p.s.d. C matrix, and only 10 
instances have a solution score of 0.99 or higher. For the case m = 4, n = 25 ,p = 3, the 
algorithm fails to find a p.s.d. C in every instance. For the case m = 6, n = 6,p = 4, 
the algorithm find a p.s.d. C in a handful of instances but ultimately fails to solve 
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the problem. We hypothesize that the problem stems from the fact that the noisy 
version of tensor has a rank that is higher than p , so the X that corresponds to the 
noisy tensor does not have full column rank. 


Size 

m n p 

p.s.d. C? 

No n 
Rank = p 

oise r/ = 0 

Rel. Err. < 10” 10 

Soln. Sc. > 0.99 

3 4 2 

4 25 3 

6 6 4 

700 77 11.6 

1000 100 2.5 

1000 100 4.4 

700 77 2.0 

1000 100 3.0 
1000 100 4.0 

700 77 0.0 

1000 100 0.0 

1000 100 0.0 

700 77 1.0 

1000 100 1.0 

1000 100 1.0 

Size 

m n p 

p.s.d. C? 

Noist 
Rank = p 

rj = 0.01 

Rel. Err. <0.1 

Soln. Sc. > 0.99 

3 4 2 

4 25 3 

6 6 4 

238 32 20.1 

0 0 

256 35 16.9 

0 0 4.0 

0 0 

0 0 6.0 

76 22 0.4 

0 0 

0 0 16.8 

17 10 0.8 

0 0 

0 0 0.4 


Table 5.2 

Random non-orthogonal examples. We create 100 random instances and run the method 10 
times per instance for a total of 1000 runs. For each metric, we report the total number of runs 
where the metric meets the desired criteria, the number of instances where at least one run meets 
the desired criteria, and the mean value of the metric for all successful attempts. For the p.s.d. 
C column, the final number is the mean number of attempts needed to find a p.s.d. C whenever it 
occurs. 


6. Conclusions. If a symmetric tensor A is known to have a factor matrix X 
with orthogonal columns, we show that it is possible to solve the symmetric orthogo¬ 
nal tensor decomposition algorithm via a straightforward matrix eigenproblem. The 
method appears to be effective even in the presence of a small amount of noise. This is 
an improvement over previous work [1] that proposed solving the problem iteratively 
using the tensor eigenvalue power method and deflation. 

We also consider the application of whitening as proposed by [1] in the case where 
a symmetric tensor A is known to have a factor matrix X that is full rank, but does 
not have orthogonal columns. In the noise-free case, the methods works extremely 
well. In the case that a small amount of noise is added, however, the current method 
is much less effective. Improving performance in that regime is a potential topic of 
future study. 
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